CelltypeR: A flow cytometry pipeline to characterize single cells from brain organoids

Summary Motivated by the cellular heterogeneity in complex tissues, particularly in brain and induced pluripotent stem cell (iPSC)-derived brain models, we developed a complete workflow to reproducibly characterize cell types in complex tissues. Our approach combines a flow cytometry (FC) antibody panel with our computational pipeline CelltypeR, enabling dataset aligning, unsupervised clustering optimization, cell type annotating, and statistical comparisons. Applied to human iPSC derived midbrain organoids, it successfully identified the major brain cell types. We performed fluorescence-activated cell sorting of CelltypeR-defined astrocytes, radial glia, and neurons, exploring transcriptional states by single-cell RNA sequencing. Among the sorted neurons, we identified subgroups of dopamine neurons: one reminiscent of substantia nigra cells most vulnerable in Parkinson’s disease. Finally, we used our workflow to track cell types across a time course of organoid differentiation. Overall, our adaptable analysis framework provides a generalizable method for reproducibly identifying cell types across FC datasets in complex tissues.


Supplemental Figures
and STAR methods The data generated was cleaned up using FlowJo (version 10.6) (Becton-Dickinson Biosciences).A) A starting gate was used to select appropriate cell size (X: FSC-A, Y: SSC-A) to remove debris and select cells.Double clicking the gated cells will give the selected population.B) A second gate was used to discriminate doublets from the analysis (X: FSC-W, Y: FSC-H), adjust the x-axis (FSC-W) to log scale.(By clicking the 'T').Single cells are selected.C) Open the selected single cells and change the x axis to the FSC-A-LiveDead Fixable Aqua channel.Click the 'T' and select 'Customize Axis' to adjust the scale.D) Custom axis window with the histogram of the LiveDead Fixable Aqua channel.Adjust the "positive decades" (circled in red) until two peaks are seen.E) Two peaks are seen on the LiveDead Fixable Aqua histogram.Click apply.F) Finally, the last gate was used to remove dead cells from the analysis (X: LiveDead Fixable Aqua, Y: FCS-A) by drawing a gate around the unstained live cell population.After data cleanup, a new .fcsfile was generated within FlowJo and exported for analysis for each FC sample.   Pruszak 2007, 35 Pruszak 2009, 36 Sundberg 2009, 37 Yuan 2011, 38 Wang  35 Pruszak 2009, 36 Yuan 2011, 38 45 Barraud 2007, 41 Pruszak 2007, 35

Figure S1 : 1 .
Figure S1: Example images with IF characterization of cell types in hMO 9 months in culture.Related to Figure 1.Cryosection with immunostaining of iPSC lines AIW002-02 (left) and AJG001C (right).Whole hMO sections with merged signals are shown in the left panel, scale bars are all 250 µm.The area in the white rectangles is enlarged on the right with each staining shown separately, the scales bars shown in the merged images are all 50 µm.Nuclei are labelled with Hoechst and shown in blue.A, B) Tyrosine hydroxylase (TH) marker of DA neurons in red and neurofilament (NF) a general neuronal marker in green, indicate DA neurons are present in hMOs from both iPSC lines.C,D) FOXA2 a neural progenitor marker of DA neuron lineage is absent at this late stage (would be yellow).GIRK2 (green) is expressed in DA neurons of the SN and is present in the hMOs.E,F) Astrocyte markers S100b (green) and AQP4 (red) indicate astrocytes are present in the hMOs.Scales bars are 250 µm in the full hMO images and 25 µm in the zoomed images.

Figure S2 :
Figure S2: FlowJo FC data cleanup to obtain live single cells for analysis.Related to Figure 1 and STAR methods The data generated was cleaned up using FlowJo (version 10.6) (Becton-Dickinson Biosciences).A) A starting gate was used to select appropriate cell size (X: FSC-A, Y: SSC-A) to remove debris and select cells.Double clicking the gated cells will give the selected population.B) A second gate was used to discriminate doublets from the analysis (X: FSC-W, Y: FSC-H), adjust the x-axis (FSC-W) to log scale.(By clicking the 'T').Single cells are selected.C) Open the selected single cells and change the x axis to the FSC-A-LiveDead Fixable Aqua channel.Click the 'T' and select 'Customize Axis' to adjust the scale.D) Custom axis window with the histogram of the LiveDead Fixable Aqua channel.Adjust the "positive decades" (circled in red) until two peaks are seen.E) Two peaks are seen on the LiveDead Fixable Aqua histogram.Click apply.F) Finally, the last gate was used to remove dead cells from the analysis (X: LiveDead Fixable Aqua, Y: FCS-A) by drawing a gate around the unstained live cell population.After data cleanup, a new .fcsfile was generated within FlowJo and exported for analysis for each FC sample.

Figure S3 :
Figure S3: Expression of the 13 protein markers varies across cells in hMOs.Related to Figure 1.Protein expression levels measured by FC in a subset of cells from 9 different hMO samples.The three iPSC lines (3450, AIW002-02-02, AJG001-C4, two batches (A and B) and two different experiment days (1 = 06/03/2020, 2 = 17/03/2020) are indicated at the top.Samples were processed identically on each experiment day and the same tubes and dilutions of antibodies as well as the same Flow cytometer settings were applied, staining and acquisition was performed on two days (1 and 2).Batches A and B each contain the three iPSC lines and were initiated from the same iPSC cultures seeded 3 weeks apart.A random sampling of 200 cells are shown in the heatmap (each bar is a single cell).

Figure S4 :
Figure S4: UMAPs of 2D cell cultures and clusters.Related to Figure 2. A) UMAP showing cells split by the original culture type and coloured by the original culture type.B) UMAP of cells split by the original culture type and colour by the clusters identified by unsupervised Louvain network detection.Cells from 2D cultures were harmonized using peak alignment.FC measurements were acquired on two experimental days, astrocytes, DA NPCs and oligodendrocyte cultures used on both experiment days (1 = 06/03/2020, 2 = 17/03/2020).DA neurons, were measured on experiment day 1 and iPSC were measured on day 2. The data from both days were pooled and then cells were randomly down sampled to 10000 cells per culture type, n=50000 cells.All cell cultures are from the AIW002-02 iPSC line.

Figure S5 :
Figure S5: UMAPs of 2D cell cultures with clusters labelled.Related to Figure2.FC marker expression from the area under the curve of mean intensity per cell followed by normalization is labelled above each plot.The UMAPs are coloured intensity relative to the normalized expression value for each marker, each scale is indicated on the right of the corresponding UMAP.Louvain network detection cluster numbers are indicated on the UMAPs.FC measurements were acquired on two experimental days, astrocytes, DA NPCs and oligodendrocyte cultures used on both experiment days (1 = 06/03/2020, 2 = 17/03/2020).DA neurons, were measured on experiment day 1 and iPSC were measured on day 2. The data from both days were pooled and then cells were randomly down sampled to 10000 cells per culture type, n= 50000 cells.

Figure S6 :
Figure S6: Preprocessing of fsc files exported from FlowJo, sample alignment and reading into R. Related to Figure 3 and STAR methods.A) Samples are merged by concatenation and not transformed or aligned in preprocessing.Files can be saved at this level of processing and one can proceed with the rest of the CelltypeR workflow if desired.For individual 2D iPSC derived cell lines, processing was stopped at this step.B) The merged expression data is biexponentially transformed and aligned by shifting means to match peaks between samples.C) The merged, biexponentially transformed, and aligned data is reverse transformed removing the biexponential transformation.The full processing was applied for hMO samples to remove experimental variability.Left panel indicates the data processing performed.Density plots visualize the distribution of the FC intensity values.Merged samples and biexponentially transformed samples (A) show that there in the expression distributions between channels.The rows in the plots correspond to the different samples.Four example markers are displayed.The alignment shown in B is the result of shifting the two peaks toward the mean across samples performed by the harmonize function.The density plot in C shows the distribution of expression after alignment and reverse transformation back to a log distribution.The UMAPs visualize cells from the 9 hMO samples indicated by colour in the sample legend.Seurat was used to scale before PCA and UMAP dimensional reductions.A shows the merged data before transformation, B shows the transformed and aligned data, C shows the aligned data, reverse transformed.The UMAP in A compared to C shows the difference between plotting the merged data compared to the aligned data.Right panel indicates heatmap of the marker expression levels across each sample.Scale bars indicate the relative expression and are matched in the heatmap and UMAP plots.Normalized expression is plotted.The data shown is 9000 cells from each of 9 hMOs (accept AJG001C batch A, experiment day 2, n=1578 cells), three iPSC lines (AIW002-02, 3450, and AJG001C), from two batches (A and B), acquired on two experiment days (1 = 06/03/2020, 2 = 17/03/2020).

Figure S7 :
Figure S7: Comparison of different R thresholds for CAM to predict cell types.Related to Figure 3. A) Bar chart showing cell counts for assigned cell types and cells left unassigned with the significant R threshold of 0.553.B) Bar chart for cell counts excluding the unassigned cells for the R threshold of 0.553.C) Bar chart showing cell counts for assigned cell types and cells left unassigned with R threshold of 0.35.D) Bar chart for cell counts excluding the unassigned cells for the R threshold of 0.35.E) Bar chart showing cell counts for assigned cell types with R threshold of 0.1.All cells pass the threshold with this correlation threshold.F) Violin plot showing the max correlation coefficients grouped by the cell type with the max correlation coefficient.The R threshold of 0.1 is indicated by the horizonal line.The data shown is 9000 cells from each of 9 hMOs (accept AJG001C batch A, experiment day 2, n=1578 cells), three iPSC lines (AIW002-02, 3450, and AJG001C), from two batches (A and B), acquired on two experiment days (1 = 06/03/2020, 2 = 17/03/2020).

Figure S8 :
Figure S8: Some cells have high correlation with two cell types in the prediction reference matrix.Related to Figure 3. A) Box plot showing the max correlation coefficient for each cell type (cor1) and the second max correlation coefficient (cor2) for each cell.The cor2 value is not for a specific cell type, it is the second highest correlation value regardless of the cell type.B) Connected point plots from a subsampling of cells in each of the predicted cell types with cor1 matched to cor2.The cor1 values are for the indicated cell type and the cor2 values are corresponding to the second max value for each hMO cell.Horizontal lines indicate that the first and second highest R values are close to equivalent.The data shown is 9000 cells from each of 9 hMOs (accept AJG001C batch A, experiment day 2, n=1578 cells), three iPSC lines (AIW002-02, 3450, and AJG001C), from two batches (A and B), acquired on two experiment days (1 = 06/03/2020, 2 = 17/03/2020).

Figure S9 :
Figure S9: Pairs of cell types with close correlation coefficients to two cell types in the prediction reference matrix could indicate an intermediate cell type.Related to Figure 3. Connected point plots of pairs of predicted cell types with a difference between the first and second highest R values less than 0.05.Only correlation with R value greater than 0.553 were included.Only cell type pairs with more than 8 cells were included.The cell type with the max correlation coefficient is on the left (cor1, blue) and the cell type with the second max correlation coefficient (cor2, green) is on the right.The hMO pairs of correlations coefficients for a given hMO cell are joined by a black line.Cell type names are abbreviated as follows: oligodendrocytes (Oligo), OPC, NPC, RG, neural stem cells (stemlike).Cell types that are a continuum of differentiation, such as neural stem cells and NPCs, or NPCs and neurons have close R values, possibly indicating these cells are starting to express markers of differentiated cell types or retaining some earlier marker expression.The neuron-oligodendrocyte pairs are not a match of a cell type continuum; however, the expression profiles of these cell types overlap.The data shown is 9000 cells from each of 9 hMOs (accept AJG001C batch A, experiment day 2, n=1578 cells), three iPSC lines (AIW002-02, 3450, and AJG001C), from two batches (A and B), acquired on two experiment days (1 = 06/03/2020, 2 = 17/03/2020).

Figure S10 :
Figure S10: Visualization of antibody expression levels in the subset of hMO cells used for cell type annotation.Related to Figure 3. A) UMAPs of normalized expression for the indicated markers shown as intensity.B) UMAP pseudo coloured by cluster, generated by PCA followed by Louvain network detection.Cluster numbers are indicated.C) Heatmap of relative expression for each antibody in the panel grouped by cluster.The scale bar is indicated below.D) UMAP coloured by CAM predicted cell types with a R threshold of 0.553.Only cell types with > 60 cells are labelled, otherwise these cells are labelled as NA.E) UMAP coloured by CAM predicted cell types with a R threshold of 0.1.Only cell types with > 500 cells are labelled, otherwise these cells are labelled as NA.The data shown is 9000 cells from each of 9 hMOs (accept AJG001C batch A, experiment day 2, n=1578 cells), three iPSC lines (AIW002-02, 3450, and AJG001C), from two batches (A and B), acquired on two experiment days (1 = 06/03/2020, 2 = 17/03/2020).

Figure S11 :
Figure S11: Random Forest Model (RFM) trained with the annotated subset of hMO data.Related to Figure 4 and STAR methods.A) Confusion matrix showing the number of cells predicted to be in each cell type for a hidden group of cells from the annotated data.The y-axis shows the true label annotation in the subset of cells and the x-axis shows the predicted cell labels.When the x and y axis labels match, the cells are correctly predicted.The number of cells predicted are indicated in the squares.The scale also indicates the number of cells in each true label to predicted label match.B) MDS plot showing the contribution of antibodies to the prediction in the RFM.High Gini decrease (yaxis) and lower mean minimum depth (x-axis) indicate a greater importance to classification.C) Line graph showing the prediction error (y-axis) for different numbers of trees used in training the RFM (x-axis).Coloured lines correspond to each cell type.OOB is the overall error rate.A low error rate indicates a better prediction.Most cell types are predicted accurately, but radial glia2, radial glia3, and oligodendrocytes are not predicted well in the RFM.

Figure S12 :
Figure S12: Visualization of protein expression levels in the full nine sample hMO dataset for cell type annotation.Related to Figure 4. A) UMAPs of normalized expression for the indicated markers shown as intensity.B) UMAP pseudo coloured by cluster, generated by PCA followed by Louvain network detection.Cluster numbers are indicated.B) UMAP visualization of Seurat clusters.C) Heatmap protein expression per cluster.All cells from each of 9 hMOs from three iPSC lines (AIW002-02, 3450, and AJG001C), two batches (A and B), and acquired on two experiment days (1 = 06/03/2020, 2 = 17/03/2020).

Figure S13 :
Figure S13: Cell type annotation predictions from CAM, RFM and Seurat label transfer.Related to Figure 4. Left, bar charts with the counts of cells predicated as each cell type in each cluster.Right, UMAPs colours by predicted cell types.Top; CAM predictions with an R threshold for assignment of 0.35 and a double cell type threshold of max Rsecond max1 of less than 0.01.The results were then filtered to included only predicted cell types with over 200 cells.Middle, RFM predictions from the model trained on the subset of 9000 cells from each of 9 hMOs.Bottom, Seurat label transfer method using the Seurat object from the subset of cells as the reference data.The data shown are all cells from each of 9 hMOs from three iPSC lines (AIW002-02, 3450, and AJG001C), two batches (A and B), and acquired on two experiment days (1 = 06/03/2020, 2 = 17/03/2020).

Figure S14 :
Figure S14: Proportionality test of cell types between iPSC lines using permutation test.Related to Figure 4. A) Dot plot showing the results of a permutation ANOVA comparing the proportion of each cell type across all three iPSC lines.The cell types are shown on the y-axis sorted by the mean proportion.The x-axis shows the difference of each observed cell type proportion from the mean proportion of all three lines.The dots for significantly different cell type proportions are shown by outlined circles (as indicated in the legend).B-D) Point range plots showing significant differences in proportions of cell types between iPSC lines using a two-condition permutation test.The iPSC lines compared are indicated above the plots.Pink dots indicate a significant difference in the proportion of the indicated cell type between the two iPSC lines with an adjusted p-value (FDR > 0.05) and log fold change in proportion > 0.58, indicated in the legend.The cell types are on the y-axis sorted by log fold change.B) AIW002-02 compared to AJG001-C4.C) AIW002-02 compared to 3450.D) AJG001-C4 compared to 3450.The data shown are all cells from each of 9 hMOs from three iPSC lines (AIW002-02, 3450, and AJG001C), two batches (A and B), and acquired on two experiment days (1 = 06/03/2020, 2 = 17/03/2020).

Figure S15 :
Figure S15: Proportionality test of cell types between AIW002-02 batches.related to Figure 5. A) Dot plot showing the results of a permutation ANOVA comparing the proportion of each cell type across four batches of AIW002-02 hMOs.The cell types are shown on the y-axis sorted by the mean proportion.The x-axis shows the difference of each observed cell type proportion from the mean proportion of all four batches.No significant differences are observed B) Point range plots showing significant differences in proportions of cell types between hMO batches using two condition permutation tests.The line contrasts are indicated above the plots.Pink dots indicate a significant difference in the proportion of the indicated cell type between the two batches.Samples are not treated as replicates by the permutation tests.Replicates are as follows Batch A has 2 replicates on separate experiment days (n cells = 35833), Batch B has one replicate (n cells = 53012), Batch C has 5 replicates, 2 replicates on one experiment day and 3 replicates on a second day (n cells = 141940), Batch D has 2 replicates on one experiment day (n cells = 60458).

Figure S16 :
Figure S16: Correlation between scRNAseq and FC marker expression values in four hMO populations sorted populations.Related to Figure 6.A) Scatter plots with FC expression on the x-axis and RNA expression on the yaxis for each marker indicated by colour and split by sorted cell type population.The correlation values are indicated on the plots.The lines are the slot of the correlation coefficients.B) Scatter plots with FC expression on the x-axis and RNA expression on the y-axis for each sorted cell type and split by markers.Values are sorted cell population samples split for FC and scRNAseq analysis.The lines are the slot of the correlation coefficient.The sorted populations are from dissociations of AIW002-02 hMOs, 9 months old from one batch labelled with the antibody panel and FC sorted using gates with values determined using hypergate.

Figure S17 :
Figure S17: Neuronal subtypes and selected markers from scRNAseq of FACS hMO populations.Related to Figure 6.A) Neurons were subset from the total population and plotted on a UMAP.Subtypes based are differentially expressed genes are indicated in the legend.B) Dot plot of 5 selected differentially expressed genes.The proportion of cells in each group expressing a marker is indicated by dot size and the expression level is shown by intensity.Cells are from AIW002-02 hMO batch C. scRNAseq from the four sorted samples were integrated, clustered, and annotated for main cell types.The neuron populations identified by scRNAseq transcriptomes was subset and clustered to identify neuronal subtypes.DGE was calculated between neuron clusters to identify cluster subtype markers.

Figure S18 :
Figure S18: DA neuronal subtypes and selected markers from scRNAseq of FACS hMO populations.Related to Figure6.A) DA neurons were subset from the total population and plotted on a UMAP.Subtypes were identified by clustering using Louvain network detection.Clusters of DA neuron subtypes were annotated using the markers identified by DGE and analysis of expression of known markers and subtypes DANeurons-VTA (ventral tegmental area), DANeurons-VM (ventral midbrain) and DANeurons-SN (substantia nigra).See TableS13and S14.B) Dot plot of 5 selected differentially expressed genes.The proportion of cells in each group expressing a marker is indicated by dot size and the expression level is shown by intensity.Cells from AIW002-02 hMO batch C. ScRNAseq from the four sorted samples were previously integrated, clustered, and annotated for main cell types.

Figure S19 :
Figure S19: Astrocyte subtypes and selected markers from scRNAseq of FACS hMO populations.Related to Figure 6.A) Astrocytes were subset from the total population and plotted on a UMAP.Subtypes clusters were identified by Louvain network detection and annotated based are differentially expressed genes.B) Dot plot of 5 selected differentially expressed genes.The proportion of cells in each group expressing a marker is indicated by dot size and the expression level is shown by intensity.Cells are from AIW002-02 hMO batch C. scRNAseq from the four sorted samples were previously integrated, clustered, and annotated for main cell types.

Figure S20 :
Figure S20: Radial glia subtypes and selected markers from scRNAseq of FACS hMO populations.Related to Figure 6.A) Radial glia cells were subset from the total population and plotted on a UMAP.Subtypes of cells were identified by Louvain network detection and annotated based on differentially expressed genes.B) Dot plot of 5 selected differentially expressed genes in each subtype.The proportion of cells in each group expressing a marker is indicated by dot size and the expression level is shown by intensity.Cells are from AIW002-02 hMO batch C. scRNAseq from the four sorted samples were previously integrated, clustered, and annotated for main cell types.

Figure S21 :
Figure S21: Visualization of the number of cellular subtypes identified by scRNAseq transcriptomes in each FC sorted population.Related to Figure 6.A) UMAP split by sorted population and coloured by cell subtypes.B) Stacked bar chart showing the number of each non-DA neuron cell subtype in each sorted population.C) Stacked bar chart showing the number of each DA neuron cell subtype in each sorted population.D) Stacked bar chart showing the number of each astrocyte cell subtype in each sorted population.E) Stacked bar chart showing the number of each radial glia cell subtype in each sorted population.Cells are from AIW002-02 hMO batch C. ScRNAseq from the four sorted samples was integrated, clustered with Louvain network detection, and annotated for main cell types and subtypes.

Figure S22 :
Figure S22: Proportion of cell types and DA neuron subtypes in FACS populations neurons 1 compared to neurons 2. Related to Figure 6.A) Point range plot showing the differences in proportions of DA neuron subtype cells neurons1 and neurons 2. B) Point range plot showing the differences in proportions of cell types in neurons1 compared to neurons2.Negative log2FD values indicate a greater proportion of cells in neurons1 and positive log2FD values indicate a greater proportion of cells in neurons2.Pink dots indicate a significant difference.

Figure S23 :
Figure S23: Proportions of cell types in AIW002 over time.Related to Figure 7. A) Dot plot showing the results of a permutation ANOVA comparing the proportion of each cell type across four time points of AIW002-02 hMOs.The cell types are shown on the y-axis sorted by the mean proportion.The x-axis shows the difference of each observed cell type proportion from the mean proportion at each time point.All cell types differ significantly across time.B) Bar chart showing proportions of cell types in each sample.Cell types are coloured and shown in the legend.Sample are replicates grouped by the indicated type points.C) Line plots showing the proportion of each cell type over time.Dots are shown for each replicate, time points are shown on the x-axis.Each cell type is shown separately.Cell types are indicated in the legend and facet labels above each plot.All samples are AIW002-02 (batch E) with four experimental replicates per time point annotated using the CelltypeR workflow.Cells were down sampled to 2000 cells per sample, n=32000.

Table S2 : Main effect of iPSC line from a 2-way ANOVA and Tukey's posthoc test of the significant differences.
Significant differences with a p-value < 0.05 are highlighted in bold.P values are shown for the 2-way ANOVA main effects and the adjusted p Value from the Tukey's HSD post hoc tests.

Table S3 : Tukey's HSD test of the interaction effect between iPSC line and protein expression.
Showing all significant differences between iPSC pairs for a given protein marker.The significance threshold was considered p value < 0.05.Mean values across cells were taken for each sample in each cell type (n=3).Diff, indicates the difference between the mean values of the 3 replicates.

Table S4 :
Hypergate prediction of cell types.Astrocytes 1 and 2 are combined and radial glia 1 and 2 are combined.Neurons1, neurons2, NPCs, and oligodendrocytes are left as separate populations.The rest of the cell types were combined into one group labelled 'other'.

Table S6 : Correlation coefficient between protein intensity levels measured by FC and RNA expression measured by scRNAseq.
The four FACS sorted populations are indicated.The 13 proteins targeted in the FC antibody panel and the corresponding genes are used as the input.